Generative Modeling of Discrete Data Using Geometric Latent Subspaces

Gonzalez-Alvarado, Daniel, Cassel, Jonas, Petra, Stefania, Schnörr, Christoph

arXiv.org Machine Learning

We introduce the use of latent subspaces in the exponential parameter space of product manifolds of categorical distributions, as a tool for learning generative models of discrete data. The low-dimensional latent space encodes statistical dependencies and removes redundant degrees of freedom among the categorical variables. We equip the parameter domain with a Riemannian geometry such that the spaces and distances are related by isometries, which enables consistent flow matching. In particular, geodesics become straight lines, which makes model training by flow matching effective. Empirical results demonstrate that reduced latent dimensions suffice to represent data for generative modeling.
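To make the "geodesics become straight lines" remark concrete, here is a minimal sketch of the quantities a flow matching objective would regress on when paths are straight lines in a parameter space. It is not the authors' construction: the softmax map from natural parameters to a categorical distribution, the endpoint values, and the function names `interpolate` and `target_velocity` are all illustrative assumptions.

```python
import math

def softmax(theta):
    """Map natural (exponential) parameters to categorical probabilities."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in theta]
    z = sum(exps)
    return [e / z for e in exps]

def interpolate(theta0, theta1, t):
    """Point at time t on the straight line from theta0 to theta1."""
    return [(1 - t) * a + t * b for a, b in zip(theta0, theta1)]

def target_velocity(theta0, theta1):
    """Velocity along a straight line: constant, independent of t."""
    return [b - a for a, b in zip(theta0, theta1)]

# Hypothetical endpoints: a "noise" sample and a "data" sample (assumed values)
theta0 = [0.0, 0.0, 0.0]
theta1 = [2.0, -1.0, 0.5]

theta_mid = interpolate(theta0, theta1, 0.5)
p_mid = softmax(theta_mid)          # categorical distribution at the midpoint
velocity = target_velocity(theta0, theta1)
```

Because the regression target is constant along each path, a velocity model only has to learn the displacement between endpoints, which is one reason straight-line paths are convenient for training.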


A tensor network formalism for neuro-symbolic AI

Goessmann, Alex, Schütte, Janina, Fröhlich, Maximilian, Eigel, Martin

arXiv.org Machine Learning

The unification of neural and symbolic approaches to artificial intelligence remains a central open challenge. In this work, we introduce a tensor network formalism, which captures sparsity principles originating in the different approaches in tensor decompositions. In particular, we describe a basis encoding scheme for functions and model neural decompositions as tensor decompositions. The proposed formalism can be applied to represent logical formulas and probability distributions as structured tensor decompositions. This unified treatment identifies tensor network contractions as a fundamental inference class and formulates efficiently scaling reasoning algorithms, originating from probability theory and propositional logic, as contraction message passing schemes. The framework enables the definition and training of hybrid logical and probabilistic models, which we call Hybrid Logic Networks. The theoretical concepts are accompanied by the Python library tnreason, which enables the implementation and practical use of the proposed architectures.
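As a rough illustration of contraction as inference, the sketch below encodes two propositional clauses as 0/1 tensors and contracts them over shared variables to count satisfying assignments and compute a marginal. It uses plain Python rather than the tnreason library, whose API is not shown in the abstract; the formula and the helper names `clause_tensor` and `contract` are hypothetical.

```python
from itertools import product

def clause_tensor(fn, arity):
    """Encode a Boolean function as a tensor: index tuple -> 0 or 1."""
    return {idx: int(fn(*idx)) for idx in product((0, 1), repeat=arity)}

# Example formula (assumed for illustration): (a OR b) AND ((NOT b) OR c)
t_or = clause_tensor(lambda a, b: a or b, 2)
t_imp = clause_tensor(lambda b, c: (not b) or c, 2)

def contract(fix_c=None):
    """Sum the product of clause tensors over all variables.

    Variable b is shared by both tensors, so it acts as the contracted
    index. If fix_c is given, variable c is held at that value.
    """
    total = 0
    for a, b, c in product((0, 1), repeat=3):
        if fix_c is not None and c != fix_c:
            continue
        total += t_or[(a, b)] * t_imp[(b, c)]
    return total

model_count = contract()                      # number of satisfying assignments
marginal_c = contract(fix_c=1) / model_count  # P(c = 1) under a uniform model distribution
```

The full contraction gives the model count (a logical query), while the ratio of a partial contraction to the full one gives a probability (a probabilistic query), which is the sense in which one contraction operation can serve both kinds of reasoning.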


Bayesian Empirical Bayes: Simultaneous Inference from Probabilistic Symmetries

Wu, Bohan, Weinstein, Eli N., Blei, David M.

arXiv.org Machine Learning

Empirical Bayes (EB) improves the accuracy of simultaneous inference "by learning from the experience of others" (Efron, 2012). Classical EB theory focuses on latent variables that are iid draws from a fitted prior (Efron, 2019). Modern applications, however, feature complex structure, like arrays, spatial processes, or covariates. How can we apply EB ideas to these settings? We propose a generalized approach to empirical Bayes based on the notion of probabilistic symmetry. Our method pairs a simultaneous inference problem (with an unknown prior) with a symmetry assumption on the joint distribution of the latent variables. Each symmetry implies an ergodic decomposition, which we use to derive a corresponding empirical Bayes method. We call this method Bayesian empirical Bayes (BEB). We show how BEB recovers the classical methods of empirical Bayes, which implicitly assume exchangeability. We then use it to extend EB to other probabilistic symmetries: (i) EB matrix recovery for arrays and graphs; (ii) covariate-assisted EB for conditional data; (iii) EB spatial regression under shift invariance. We develop scalable algorithms based on variational inference and neural networks. In simulations, BEB outperforms existing approaches to denoising arrays and spatial data. On real data, we demonstrate BEB by denoising a cancer gene-expression matrix and analyzing spatial air-quality data from New York City.
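For readers unfamiliar with the classical iid setting that the abstract takes as its starting point, the following is a minimal sketch of empirical Bayes shrinkage in the normal means model. It illustrates classical EB only, not the BEB method; the Gaussian prior, unit noise variance, sample size, and method-of-moments fit are assumptions chosen for simplicity.

```python
import random

random.seed(0)

# Simulated setting (assumed): theta_i ~ N(0, prior_var), x_i ~ N(theta_i, 1)
n, prior_var = 2000, 1.0
thetas = [random.gauss(0.0, prior_var ** 0.5) for _ in range(n)]
xs = [th + random.gauss(0.0, 1.0) for th in thetas]

# Fit the prior from the data: marginally E[x^2] = prior_var + 1
prior_var_hat = max(0.0, sum(x * x for x in xs) / n - 1.0)

# Posterior mean under the fitted prior shrinks each observation toward 0
shrink = prior_var_hat / (prior_var_hat + 1.0)
eb_estimates = [shrink * x for x in xs]

def mse(estimates):
    """Mean squared error against the simulated true means."""
    return sum((e - th) ** 2 for e, th in zip(estimates, thetas)) / n

raw_mse = mse(xs)           # error of using each observation as its own estimate
eb_mse = mse(eb_estimates)  # error after borrowing strength across observations
```

Each estimate uses all observations through the fitted prior variance, which is the "learning from the experience of others" idea in its simplest form.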


Comparing Two Proxy Methods for Causal Identification

Guo, Helen, Ogburn, Elizabeth L., Shpitser, Ilya

arXiv.org Machine Learning

Identifying causal effects in the presence of unmeasured variables is a fundamental challenge in causal inference, for which proxy variable methods have emerged as a powerful solution. We contrast two major approaches in this framework: (1) bridge equation methods, which leverage solutions to integral equations to recover causal targets, and (2) array decomposition methods, which recover latent factors composing counterfactual quantities by exploiting unique determination of eigenspaces. We compare the model restrictions underlying these two approaches and provide insight into implications of the underlying assumptions, clarifying the scope of applicability for each method.


Theory and computation for structured variational inference

Sheng, Shunan, Wu, Bohan, Zhu, Bennett, Chewi, Sinho, Pooladian, Aram-Alexandre

arXiv.org Machine Learning

Structured variational inference constitutes a core methodology in modern statistical applications. Unlike mean-field variational inference, the approximate posterior is assumed to have interdependent structure. We consider the natural setting of star-structured variational inference, where a root variable impacts all the other ones. We prove the first results for existence, uniqueness, and self-consistency of the variational approximation. In turn, we derive quantitative approximation error bounds for the variational approximation to the posterior, extending prior work from the mean-field setting to the star-structured setting. We also develop a gradient-based algorithm with provable guarantees for computing the variational approximation using ideas from optimal transport theory. We explore the implications of our results for Gaussian measures and hierarchical Bayesian models, including generalized linear models with location family priors and spike-and-slab priors with one-dimensional debiasing. As a by-product of our analysis, we develop new stability results for star-separable transport maps which might be of independent interest.


Copula-Stein Discrepancy: A Generator-Based Stein Operator for Archimedean Dependence

Aich, Agnideep, Aich, Ashit Baran

arXiv.org Machine Learning

Kernel Stein discrepancies (KSDs) have become a principal tool for goodness-of-fit testing, but standard KSDs are often insensitive to higher-order dependency structures, such as tail dependence, which are critical in many scientific and financial domains. We address this gap by introducing the Copula-Stein Discrepancy (CSD), a novel class of discrepancies tailored to the geometry of statistical dependence. By defining a Stein operator directly on the copula density, CSD leverages the generative structure of dependence, rather than relying on the joint density's score function. For the broad class of Archimedean copulas, this approach yields a closed-form Stein kernel derived from the scalar generator function. We provide a comprehensive theoretical analysis, proving that CSD (i) metrizes weak convergence of copula distributions, ensuring it detects any mismatch in dependence; (ii) has an empirical estimator that converges at the minimax optimal rate of $O_P(n^{-1/2})$; and (iii) is provably sensitive to differences in tail dependence coefficients. The framework is extended to general non-Archimedean copulas, including elliptical and vine copulas. Computationally, the exact CSD kernel evaluation scales linearly in dimension, while a novel random feature approximation reduces the $n$-dependence from quadratic $O(n^2)$ to near-linear $\tilde{O}(n)$, making CSD a practical and theoretically principled tool for dependence-aware inference.


Assessing the robustness of heterogeneous treatment effects in survival analysis under informative censoring

Wang, Yuxin, Frauen, Dennis, Schweisthal, Jonas, Schröder, Maresa, Feuerriegel, Stefan

arXiv.org Machine Learning

Dropout is common in clinical studies, with up to half of patients leaving early due to side effects or other reasons. When dropout is informative (i.e., dependent on survival time), it introduces censoring bias, which in turn biases treatment effect estimates. In this paper, we propose an assumption-lean framework to assess the robustness of conditional average treatment effect (CATE) estimates in survival analysis when facing censoring bias. Unlike existing works that rely on strong assumptions, such as non-informative censoring, to obtain point estimates, we use partial identification to derive informative bounds on the CATE. Thereby, our framework helps to identify patient subgroups where treatment is effective despite informative censoring. We further develop a novel meta-learner that estimates the bounds using arbitrary machine learning models and has favorable theoretical properties, including double robustness and quasi-oracle efficiency. We demonstrate the practical value of our meta-learner through numerical experiments and in an application to a cancer drug trial. Together, our framework offers a practical tool for assessing the robustness of estimated treatment effects in the presence of censoring and thus promotes the reliable use of survival data for evidence generation in medicine and epidemiology.
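The idea of partial identification can be sketched with the simplest possible case: assumption-free worst-case bounds on a survival probability when some patients are censored. This is not the paper's meta-learner or its bounds; the function `survival_bounds`, the record format, and the toy data are hypothetical and serve only to show how bounds replace a point estimate.

```python
def survival_bounds(records, t):
    """Worst-case bounds on P(T > t) without assumptions on censoring.

    records: list of (time, event) pairs, where event is True if the
    event was observed at `time` and False if the patient was censored.
    """
    known_alive = 0  # followed past t, so survival beyond t is certain
    unknown = 0      # censored at or before t, so status at t is unobserved
    for time, event in records:
        if time > t:
            known_alive += 1
        elif not event:
            unknown += 1
    n = len(records)
    lower = known_alive / n              # every censored patient had the event by t
    upper = (known_alive + unknown) / n  # every censored patient survived past t
    return lower, upper

# Toy data (assumed): (follow-up time in months, event observed?)
treated = [(5, True), (12, False), (3, False), (15, True)]
control = [(4, True), (11, True), (2, False), (6, True)]

lo_t, hi_t = survival_bounds(treated, t=10)
lo_c, hi_c = survival_bounds(control, t=10)

# Bounds on the difference in survival probability at t = 10
effect_lower = lo_t - hi_c
effect_upper = hi_t - lo_c
```

If the lower bound on the effect stays above zero for a subgroup, the treatment benefit there holds regardless of how censoring depends on survival time, which is the kind of conclusion the abstract describes.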